A Metric Approach to Building Decision Trees Based on Goodman-Kruskal Association Index
Abstract
We introduce a numerical measure on sets of partitions of finite sets that is linked to the Goodman-Kruskal association index commonly used in statistics. This measure allows us to define a metric on such partitions, which we use to construct decision trees. Experimental results suggest that replacing the usual splitting criterion of C4.5 with a metric criterion based on the Goodman-Kruskal coefficient yields, in most cases, smaller decision trees without sacrificing accuracy.
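The abstract does not state the exact definitions, so the following is only a minimal sketch of how a Goodman-Kruskal-style split criterion might look. It assumes the association index is the majority-rule prediction error between two partitions (given as label sequences) and that the distance is the sum of that error in both directions. The names `gk_error`, `gk_distance`, and `best_split` are hypothetical and are not taken from the paper or from C4.5.

```python
from collections import Counter


def gk_error(x, y):
    """Fraction of items misclassified when the label in y is predicted
    from the label in x by majority vote (assumed form of the index)."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    joint = Counter(zip(x, y))
    # For each block of x, keep the size of its largest overlap with a block of y.
    best = {}
    for (xv, _yv), count in joint.items():
        best[xv] = max(best.get(xv, 0), count)
    return 1.0 - sum(best.values()) / len(x)


def gk_distance(x, y):
    """Symmetrized prediction error, used here as an illustrative distance."""
    return gk_error(x, y) + gk_error(y, x)


def best_split(attributes, target):
    """Pick the attribute whose partition is closest to the class partition."""
    return min(attributes, key=lambda name: gk_distance(attributes[name], target))


if __name__ == "__main__":
    target = [0, 0, 1, 1]
    attributes = {
        "informative": ["a", "a", "b", "b"],  # matches the class partition
        "constant": ["c", "c", "c", "c"],     # carries no information
    }
    print(best_split(attributes, target))
```

Choosing the attribute at minimum distance from the class partition mirrors the idea of a metric splitting criterion; a full tree builder would apply `best_split` recursively to each resulting subset.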
Similar resources
Goodman-Kruskal Measure of Association for Fuzzy-Categorized Variables
The Goodman–Kruskal measure, which is a well-known measure of dependence for contingency tables, is generalized to the case when the variables of interest are categorized by linguistic terms rather than crisp sets. In addition, to test the hypothesis of independence in such contingency tables, a novel method of decision making is developed based on a concept of fuzzy p-value. The applicability ...
Partial Association Components in Multi-way Contingency Tables and Their Statistical Analysis
In analyses of contingency tables made up of categorical variables, the study of the relationship between the variables is usually the major objective. So far, many association measures and association models have been used to measure the association structure present in the table. Although the association measures merely determine the degree of strength of association between the study varia...
A new metric splitting criterion for decision trees
We examine a new approach to building decision trees by introducing a geometric splitting criterion, based on the properties of a family of metrics on the space of partitions of a finite set. This criterion can be adapted to the characteristics of the data sets and the needs of the users and yields decision trees that have smaller sizes and fewer leaves than the trees built with standard methods...
A less-greedy two-term Tsallis Entropy Information Metric approach for decision tree classification
The construction of efficient and effective decision trees remains a key topic in machine learning because of their simplicity and flexibility. Many heuristic algorithms have been proposed to construct near-optimal decision trees. Most of them, however, are greedy algorithms that have the drawback of obtaining only local optima. Besides, the conventional split criteria they use, e.g. Shannon ...
A Multi-Criteria Decision-Making Approach with Interval Numbers for Evaluating Project Risk Responses
Risk response development is one of the main phases of project risk management and has a major impact on the success of a large-scale project. Since projects are unique, and risks are dynamic throughout the life of a project, it is necessary to formulate responses to the important risks. Conventional approaches tend to be less effective in dealing with the imprecision of the risk response deve...